Testing for Significance of Increased Correlation with Human Judgment
Authors: Yvette Graham, Timothy Baldwin
Abstract
Automatic metrics are widely used in machine translation as a substitute for human assessment. With the introduction of any new metric comes the question of how well that metric mimics human assessment of translation quality, which is often measured by correlation with human judgment. However, significance tests are generally not used to establish whether improvements over existing methods such as BLEU are statistically significant or have simply occurred by chance. In this paper, we introduce a significance test for comparing the correlations of two metrics, along with an open-source implementation of the test. When applied to a range of metrics across seven language pairs, the tests show that for a high proportion of metrics there is insufficient evidence to conclude a significant improvement over BLEU.
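The abstract does not name the test it introduces. One standard way to compare two correlations that share a variable (here, the human scores) is Williams' test for dependent correlations; the sketch below assumes that test and is not the open-source implementation the abstract refers to. Because both metrics are scored against the same human judgments, their correlations with those judgments are not independent, so the test also uses the correlation between the two metrics, r23. The function names (`pearson`, `t_sf`, `williams_test`) and the toy scores are hypothetical, and the p-value comes from a hand-written Student's t tail (the finite series for integer degrees of freedom) so that only the standard library is needed.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_sf(t: float, df: int) -> float:
    """Upper-tail probability P(T > t) of Student's t with integer df >= 1.

    Uses the finite series for integer df, which gives a = P(|T| < |t|).
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    total = 0.0
    if df % 2 == 1:
        # Odd df: a = (2/pi) * (theta + sin * (cos + (2/3)cos^3 + ...)).
        term = c
        for k in range(1, df - 1, 2):
            total += term
            term *= c * c * (k + 1) / (k + 2)
        a = 2.0 / math.pi * (theta + s * total)
    else:
        # Even df: a = sin * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...).
        term = 1.0
        for k in range(0, df - 1, 2):
            total += term
            term *= c * c * (k + 1) / (k + 2)
        a = s * total
    upper = (1.0 - a) / 2.0  # P(T > |t|)
    return upper if t >= 0 else 1.0 - upper


def williams_test(
    human: Sequence[float], metric_a: Sequence[float], metric_b: Sequence[float]
) -> Tuple[float, float]:
    """Williams' test for corr(human, A) vs corr(human, B) on the same items.

    Returns (t, p): t has n - 3 degrees of freedom, and p is the one-sided
    p-value for the alternative that metric A correlates more strongly.
    """
    n = len(human)
    if n <= 3 or len(metric_a) != n or len(metric_b) != n:
        raise ValueError("need more than 3 items, scored by all three sources")
    r12 = pearson(human, metric_a)     # metric A vs human judgment
    r13 = pearson(human, metric_b)     # metric B vs human judgment
    r23 = pearson(metric_a, metric_b)  # the two metrics vs each other
    # Determinant of the 3x3 correlation matrix of (human, A, B).
    k = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    denom = math.sqrt(
        2 * k * (n - 1) / (n - 3) + ((r12 + r13) ** 2 / 4) * (1 - r23) ** 3
    )
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23)) / denom
    return t, t_sf(t, n - 3)


if __name__ == "__main__":
    # Toy scores invented for illustration; not data from the paper.
    human = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    new_metric = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9]
    baseline = [2.0, 1.0, 3.5, 3.0, 6.0, 4.5, 8.0, 6.5]
    t, p = williams_test(human, new_metric, baseline)
    print(f"t = {t:.3f} (df = {len(human) - 3}), one-sided p = {p:.4f}")
```

A small one-sided p-value (for example below 0.05) would be evidence that metric A correlates more strongly with human judgment than metric B; a large one means there is insufficient evidence of an improvement. For very small p-values, `1 - a` loses floating-point precision, so a library routine would be preferable in practice.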
Similar sources
Graham, Yvette and Timothy Baldwin (to appear) Testing for Significance of Increased Correlation with Human Judgment, In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the core of Machine Translation (MT) engines, since these engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Evaluation of Extracellular Circulating Human MicroRNA-197 as a Target Biomarker in Patients with Coronary Artery Disease
Background: Coronary Artery Disease (CAD) refers to the reduction or blockage of all or part of the coronary arteries due to the process of atherosclerosis or the presence of a clot. The aim of this study was to investigate the association of serum miR-197 as a diagnostic index in patients with coronary artery disease. Methods: In this study, 100 patients with CAD were selected. Extraction of...
Comparison of Human Judgment, Help-Seeking and Social Acceptability in Students with and without Dyslexia
The present study was conducted to compare human judgment, help-seeking and social acceptability in students with and without dyslexia. The research method was causal-comparative (ex post facto). The statistical population of the study consisted of all students with specific reading disabilities in the primary schools of Rasht in the first half of the 2017-2018 academic year. ...
The Relationship between Empowerment Climate and Perceived Empowerment of Staff in Kerman Teaching Hospitals
Introduction: The stability and continuity of hospitals' activities depend, more than ever before, on competitive performance, and achieving it requires demonstrating better-quality performance than competitors. Nowadays, the most important tools for improving the richness and value of hospitals are considered to be experienced personnel or, in other words, the source of the powerful hu...
Publication date: 2014